In this exercise, we will use functions from the tidyverse and broom packages. More attractive tables, like those seen in lectures, can be produced using the gt package.

library(tidyverse)
library(broom)
library(gt)

(a) Comparing number of trips for those with and without BRM

We’ve seen the data in airport_screening.csv in the lectures. We also noticed that there seemed to be a relationship between the number of trips and the tendency to bring in Biosecurity Risk Material (BRM), but we haven’t assessed it formally.

Load the data set using screening <- read_csv("airport_screening.csv") and use the t.test function to compare the average number of trips between those people with BRM and those without. Recall that we use the y ~ x model syntax for two-sample t-tests like this (the data are not paired).

Extract the P-value by storing the output of tidy() in an object and pulling out its p.value column. You could even try formatting it automatically using the scales package, as demonstrated in lectures.

screening <- read_csv("airport_screening.csv")
Rows: 200 Columns: 24
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (6): sex, passenger_crew, arrival_port, airline, check_in_port, passpor...
dbl (18): BRM, age, month, year, period_stay, number_trips, eggs, other_flow...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
t.test(number_trips ~ BRM, data = screening)

    Welch Two Sample t-test

data:  number_trips by BRM
t = 4.281, df = 189.34, p-value = 2.954e-05
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 15.68899 42.50207
sample estimates:
mean in group 0 mean in group 1 
       39.26220        10.16667 
test.result <- tidy(t.test(number_trips ~ BRM, data = screening))
test.result$p.value
[1] 2.953688e-05
scales::pvalue(test.result$p.value)
[1] "<0.001"
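
As a minimal sketch of how a tidied test result could be turned into a gt table, the example below uses made-up toy data rather than the screening data, so the numbers it produces are not results from this exercise. The names toy, toy_result, toy_summary and toy_table are hypothetical, and the native pipe |> assumes R 4.1 or later.

```r
library(broom)
library(gt)

# Hypothetical toy data standing in for airport_screening.csv:
# a 0/1 BRM indicator and a trip count for each person.
set.seed(1)
toy <- data.frame(
  BRM = rep(c(0, 1), times = c(40, 20)),
  number_trips = c(rpois(40, lambda = 30), rpois(20, lambda = 12))
)

# Welch two-sample t-test (the t.test default), tidied into a one-row tibble
toy_result <- tidy(t.test(number_trips ~ BRM, data = toy))

# Keep the quantities of interest and format the P-value as text
toy_summary <- data.frame(
  difference = toy_result$estimate,
  conf_low   = toy_result$conf.low,
  conf_high  = toy_result$conf.high,
  p_value    = scales::pvalue(toy_result$p.value)
)

# Build the table: round the numeric columns and give readable headings
toy_table <- gt(toy_summary) |>
  fmt_number(columns = c(difference, conf_low, conf_high), decimals = 1) |>
  cols_label(
    difference = "Difference in means",
    conf_low   = "Lower 95% CI",
    conf_high  = "Upper 95% CI",
    p_value    = "P-value"
  )

toy_table
```

The same steps should carry over to test.result above, since tidy() returns the estimate, conf.low, conf.high and p.value columns for any two-sample t-test.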

(b) Looking at the association between passenger/crew, passport country and BRM

Use Fisher’s exact test (fisher.test()) to compare the rates of BRM between passengers and crew.

Use a Chi-squared test (chisq.test()) to compare the rates of BRM across the four passport countries.

Use group_by() and summarise() with mean() to print the actual proportions of BRM in each group, following the example in lectures, to help interpret these results.

fisher.test(screening$passenger_crew, screening$BRM)

    Fisher's Exact Test for Count Data

data:  screening$passenger_crew and screening$BRM
p-value = 0.355
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
 0.3744039       Inf
sample estimates:
odds ratio 
       Inf 
chisq.test(screening$passport_country, screening$BRM)
Warning in chisq.test(screening$passport_country, screening$BRM): Chi-squared
approximation may be incorrect

    Pearson's Chi-squared test

data:  screening$passport_country and screening$BRM
X-squared = 30.605, df = 3, p-value = 1.029e-06
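
The warning above is issued by chisq.test() when at least one expected cell count is below 5, in which case the Chi-squared approximation can be unreliable. The sketch below shows one way to check this and one possible alternative. It uses a made-up table of counts (toy_counts is hypothetical and is not the screening data), and simulating the P-value is only one option, not necessarily the approach taken in lectures.

```r
# Hypothetical 4 x 2 table of counts (rows: groups, columns: BRM absent/present).
# These numbers are invented for illustration only.
toy_counts <- matrix(
  c(80,  2,
    47,  8,
    15, 10,
    12, 13),
  ncol = 2, byrow = TRUE,
  dimnames = list(group = c("A", "B", "C", "D"), BRM = c("0", "1"))
)

# Expected counts under independence; any cell below 5 triggers the warning
toy_test <- suppressWarnings(chisq.test(toy_counts))
toy_test$expected

# Alternative: a Monte Carlo P-value that does not rely on the approximation
set.seed(1)
toy_sim <- chisq.test(toy_counts, simulate.p.value = TRUE, B = 10000)
toy_sim$p.value
```

If the simulated P-value leads to the same conclusion as the approximate one, the warning is of little practical concern.
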
screening %>%
  group_by(passenger_crew) %>%
  summarise(mean(BRM))
# A tibble: 2 × 2
  passenger_crew `mean(BRM)`
  <chr>                <dbl>
1 C                    0    
2 P                    0.188
screening %>%
  group_by(passport_country) %>%
  summarise(mean(BRM))
# A tibble: 4 × 2
  passport_country `mean(BRM)`
  <chr>                  <dbl>
1 A                     0.0244
2 B                     0.145 
3 C                     0.4   
4 D                     0.52  
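
Proportions are easier to interpret next to the group sizes they are based on. The sketch below adds counts to the summary and shows why Fisher’s exact test reports an infinite odds ratio when one group has no BRM at all. It uses made-up toy data (toy and toy_props are hypothetical names) that mimics the pattern above, with no BRM among crew.

```r
library(dplyr)

# Hypothetical toy data: no BRM among crew ("C"), some among passengers ("P")
toy <- data.frame(
  passenger_crew = rep(c("C", "P"), times = c(8, 40)),
  BRM = c(rep(0, 8), rep(c(0, 1), times = c(32, 8)))
)

# Group size, number with BRM and proportion with BRM for each group
toy_props <- toy %>%
  group_by(passenger_crew) %>%
  summarise(
    n = n(),
    n_BRM = sum(BRM),
    prop_BRM = mean(BRM)
  )
toy_props

# The 2 x 2 table has a zero cell, so the estimated odds ratio is not finite
table(toy$passenger_crew, toy$BRM)
fisher.test(toy$passenger_crew, toy$BRM)$estimate
```

With so few crew in the toy data, a proportion of zero is weak evidence of a real difference, which is the kind of situation where an extreme odds ratio can sit alongside a non-significant P-value.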

© 2021 Statistical Consulting Centre, The University of Melbourne.